[pathfinder] Use LOAD_WITH_ALTERED_SEARCH_PATH for system DLL search on Windows#1506
rwgk wants to merge 5 commits into NVIDIA:main from
…n Windows When loading CUDA DLLs via system search on Windows, the previous approach using LoadLibraryExW with flags=0 would find the DLL on PATH but fail to locate its co-located dependencies (error 126). This fix uses SearchPathW to first find the DLL's full path, then loads it with LOAD_WITH_ALTERED_SEARCH_PATH so Windows searches for dependencies starting from the DLL's directory.
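The two-step load described in this commit message can be sketched with ctypes roughly as follows. This is a minimal illustration, not the code in load_dl_windows.py: the helper name load_via_search_path and the buffer size are assumptions made for the sketch, and on non-Windows platforms the function simply returns None.

```python
import ctypes
import sys

LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008
BUFFER_CHARS = 32768  # assumed buffer size for this sketch (long-path limit)


def load_via_search_path(dll_name):
    """Return a module handle for dll_name, or None if it cannot be located.

    Hypothetical helper for illustration only.
    """
    if sys.platform != "win32":
        return None  # SearchPathW and LoadLibraryExW exist only on Windows

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.SearchPathW.argtypes = [
        wintypes.LPCWSTR,  # lpPath (None selects the default search, incl. PATH)
        wintypes.LPCWSTR,  # lpFileName
        wintypes.LPCWSTR,  # lpExtension
        wintypes.DWORD,    # nBufferLength, in characters
        wintypes.LPWSTR,   # lpBuffer
        ctypes.POINTER(wintypes.LPWSTR),  # lpFilePart (optional)
    ]
    kernel32.SearchPathW.restype = wintypes.DWORD
    kernel32.LoadLibraryExW.argtypes = [
        wintypes.LPCWSTR, wintypes.HANDLE, wintypes.DWORD,
    ]
    kernel32.LoadLibraryExW.restype = wintypes.HMODULE

    # Step 1: resolve the bare DLL name to a full path.
    buf = ctypes.create_unicode_buffer(BUFFER_CHARS)
    length = kernel32.SearchPathW(None, dll_name, None, len(buf), buf, None)
    if length == 0 or length > len(buf):
        return None  # not found, or the buffer was too small

    # Step 2: load by absolute path; dependencies are searched for
    # starting from the DLL's own directory.
    handle = kernel32.LoadLibraryExW(buf.value, None, LOAD_WITH_ALTERED_SEARCH_PATH)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    return handle
```

Declaring argtypes and restype keeps 64-bit module handles from being truncated, which matters once the handle is passed on to other Win32 calls.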
|
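Archiving proof-of-concept code:
# LoadLibraryExW_nvrtc64_130_0_dll.py
import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
# Declare the signature so a 64-bit module handle is not truncated to a C int.
kernel32.LoadLibraryExW.argtypes = [wintypes.LPCWSTR, wintypes.HANDLE, wintypes.DWORD]
kernel32.LoadLibraryExW.restype = wintypes.HMODULE

dll_path = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1\bin\x64\nvrtc64_130_0.dll"
LOAD_WITH_ALTERED_SEARCH_PATH = 0x8

print("Trying with full path + LOAD_WITH_ALTERED_SEARCH_PATH...")
handle = kernel32.LoadLibraryExW(dll_path, None, LOAD_WITH_ALTERED_SEARCH_PATH)
if handle:
    print(f"SUCCESS: handle={handle}")
else:
    # Report the Win32 error captured by use_last_error=True.
    error = ctypes.get_last_error()
    print(f"Error code: {error}")
    print(f"Error message: {ctypes.FormatError(error)}")
|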
Made-with: Cursor # Conflicts: # cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_dl_windows.py
…_HOME/CUDA_PATH) Loads nvrtc in a subprocess with CUDA_HOME and CUDA_PATH stripped from the environment. On Windows CI where nvrtc is only reachable via PATH, this exercises the LOAD_WITH_ALTERED_SEARCH_PATH fix; on other platforms the test passes harmlessly via whatever search path finds nvrtc first. Made-with: Cursor
|
NOTE: In the below,
Sanity check: Reproduce original problem with the latest cuda-python
Then git switching to this PR (everything else exactly equal; PR currently based on same
Now moving
This still works:
This also works:
But not when git switching back to
With
After git switching back to this PR:
|
Cursor (Claude 4.6 Opus (Thinking)) Analysis
Summary
The experiments posted in PR #1506 comment reveal that the original PR description misdiagnoses the root cause. The real bug is that
Evidence
Root cause
Since Python 3.8, CPython calls
PATH is excluded entirely. The CUDA DLLs live in
Why the PR fix works
The fix has two parts, both necessary:
What the PR description should say
Before (incorrect):
After (corrected):
Impact on testing
The regression test (
|
Also load nvrtc with the normal environment first. If the normal load finds nvrtc in a directory on PATH but the CUDA_HOME/CUDA_PATH-stripped load fails, the test now fails instead of skipping — directly catching the Windows bug where SetDefaultDllDirectories excludes PATH from LoadLibraryExW. Made-with: Cursor
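The decision logic this commit describes (load once normally, once with CUDA_HOME/CUDA_PATH stripped, and fail only when the normal load came from a PATH directory) might look roughly like the sketch below. All helper names here are hypothetical and the actual test in the PR may be structured differently; the child code is passed in as a string so the sketch stays independent of cuda.pathfinder.

```python
import os
import subprocess
import sys

STRIPPED_VARS = ("CUDA_HOME", "CUDA_PATH")


def env_without_cuda_vars(env=None):
    """Copy an environment mapping, dropping CUDA_HOME and CUDA_PATH."""
    base = dict(os.environ if env is None else env)
    # Compare upper-cased names because Windows variable names are case-insensitive.
    return {k: v for k, v in base.items() if k.upper() not in STRIPPED_VARS}


def run_child(code, env):
    """Run a Python snippet in a fresh subprocess; return (returncode, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", code], env=env, capture_output=True, text=True
    )
    return proc.returncode, proc.stdout.strip()


def dir_is_on_path(directory, path_value):
    """True if directory is one of the entries of a PATH-style string."""
    def norm(p):
        return os.path.normcase(os.path.normpath(p))

    entries = {norm(p) for p in path_value.split(os.pathsep) if p}
    return norm(directory) in entries


def verdict(normal_dir, stripped_ok, path_value):
    """Classify the outcome of the normal and stripped loads.

    normal_dir: directory the normal load found the library in, or None.
    stripped_ok: whether the load without CUDA_HOME/CUDA_PATH succeeded.
    """
    if stripped_ok:
        return "pass"
    if normal_dir is not None and dir_is_on_path(normal_dir, path_value):
        return "fail"  # reachable via PATH, yet the stripped load failed
    return "skip"      # the library is simply not reachable via PATH
```

A caller would run the same loading snippet through run_child twice, once with os.environ and once with env_without_cuda_vars(), and feed the results to verdict.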
|
To ensure the hardened test (commit b2aabfe) works as intended:
NOTE: Initially
Now setting
|
…agnosis SearchPathW is the primary fix (bypasses Python 3.8+'s SetDefaultDllDirectories restriction); LOAD_WITH_ALTERED_SEARCH_PATH is a secondary benefit for dependency resolution. Made-with: Cursor
|
/ok to test |
|
|
Independent Cursor (GPT-5.4 Extra High) Review
Findings
Assumptions
Brief
|
Cursor Claude 4.6 Opus (Thinking) analysis of the GPT-5.4 review and the alternative PR #1794:
It raises two valid points about the
The alternative (
|
|
Back to Cursor GPT-5.4 Extra High
User prompt
Assistant response
My take: I would not merge either code change.
The newest comment lands in the right place. On Windows, this behavior is coming from CPython’s hardened DLL-loading setup, not from a random pathfinder mistake. Both fixes are really trying to re-introduce
Given your actual support story, “document, don’t fix” seems strongest:
So I’d frame this as “the docs/expectations are wrong,” not “the loader is broken.” In particular, the current public search-order text in
If you go docs-only, I’d also change the new test in
My recommendation:
If you want, I can make that documentation-only change set and replace the test with something that matches the documented Windows limitation.
|
Fixes #1781
Problem
On Windows, Python 3.8+ calls SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS) at process startup. This restricts LoadLibraryExW(name, NULL, 0) to searching only:
- The application directory
- The system directory (System32)
- Directories added via AddDllDirectory()
PATH is excluded entirely. This means the pathfinder's load_with_system_search() silently fails for all CUDA Toolkit DLLs when CUDA_HOME/CUDA_PATH are not set, even though the DLLs are on PATH (e.g. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin\x64\).
The bug went unnoticed because our Windows test environments always have CUDA_HOME or CUDA_PATH set, so the PATH-based fallback was never exercised.
Fix
Replace the bare LoadLibraryExW(dll_name, None, 0) call with a two-step approach:
1. SearchPathW to locate the DLL's full path. SearchPathW is not affected by SetDefaultDllDirectories and always searches PATH.
2. LoadLibraryExW(full_path, None, LOAD_WITH_ALTERED_SEARCH_PATH) to load the DLL by absolute path. LOAD_WITH_ALTERED_SEARCH_PATH additionally tells Windows to resolve the DLL's dependencies starting from its own directory, which is a useful secondary benefit for DLLs with co-located dependencies (e.g. nvrtc + nvrtc-builtins).
Test
Adds test_load_nvrtc_without_cuda_home_or_cuda_path, which loads nvrtc in two fresh subprocesses:
1. With the normal environment.
2. With CUDA_HOME and CUDA_PATH stripped.
If the normal load found nvrtc in a directory on PATH but the stripped load failed, the test fails, directly catching the bug.
Verified on Windows (without the fix applied to load_dl_windows.py): LOAD_WITH_ALTERED_SEARCH_PATH for system DLL search on Windows #1506 (comment)
CUDA_PATH set?
main)
CUDA_PATH, dir is on PATH, stripped load fails)
SearchPathW finds it via PATH)
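To check whether a machine is in the situation described under Problem (a DLL that is reachable through PATH but not through CUDA_HOME/CUDA_PATH), one option is to walk PATH by hand. The sketch below is a plain-Python diagnostic, not part of the PR; find_on_path is a hypothetical helper, and unlike SearchPathW it looks only at PATH entries and nowhere else.

```python
import os


def find_on_path(file_name, path_value=None):
    """Return the first PATH entry containing file_name as a full path, else None.

    Hypothetical diagnostic helper; it inspects PATH entries only.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, file_name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Example file name taken from the proof-of-concept earlier in this thread.
    print(find_on_path("nvrtc64_130_0.dll"))
```

If this prints a path while a name-only load fails in a process without CUDA_HOME and CUDA_PATH, the failure is consistent with PATH being excluded from the loader's search.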